ProjecTILs case study - MC38 TILs
In this case study, we will build an integrated scRNA-seq analysis workflow to interpret the transcriptional and clonal structure of tumor-infiltrating T cells in MC38 colon adenocarcinoma (data from Xiong et al 2019).
The main R packages and methods employed in this workflow are:
- Seurat - for storing, processing and visualizing scRNA-seq data
- ProjecTILs - for the projection of scRNA-seq data into a reference TIL atlas
- scRepertoire - for the analysis of TCR-seq data
Note that scRepertoire requires R version 4.0 or higher - you will need to have R v.4 installed to run this case study.
R Environment
Check & load R packages
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
if (!requireNamespace("renv")) install.packages("renv")
library(renv)
renv::restore()
# Load development version of ProjecTILs (compatible with R v.4)
remotes::install_git("https://gitlab.unil.ch/carmona/ProjecTILs.git", ref = "v0.4.1")
library(scRepertoire)
library(ProjecTILs)
library(gridExtra)
library(ggplot2)
library(plotly)scRNA-seq data preparation
Download scRNA-seq data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7919/) After download and unpacking (you will need curl and unzip), you should get three files: matrix.mtx, genes.tsv and barcodes.tsv
files <- c("E-MTAB-7919.processed.1.zip", "E-MTAB-7919.processed.2.zip", "E-MTAB-7919.processed.3.zip")
matrix_dir <- "./input/Xiong_TIL/matrix"
system(sprintf("mkdir -p %s", matrix_dir))
for (i in 1:length(files)) {
data_path <- sprintf("https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7919/%s",
files[i])
system(sprintf("curl %s --output %s/data.zip", data_path, matrix_dir))
system(sprintf("unzip -o %s/data.zip -d %s", matrix_dir, matrix_dir))
}
system(sprintf("rm %s/data.zip", matrix_dir))Load scRNA-seq data and store as Seurat object
projectID <- "Xiong_TIL"
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7
exp_mat <- Read10X(matrix_dir)
querydata <- CreateSeuratObject(counts = exp_mat, project = projectID, min.cells = 3,
min.features = 50)
querydata$Sample <- substring(colnames(querydata), 18)
table(querydata$Sample)
4 5 6 7
1759 1746 1291 1386
querydata$SampleLabel <- factor(querydata$Sample, levels = c(4:7), labels = libIDtoSampleID)
table(querydata$SampleLabel)
Mouse 1 Mouse 2 Mouse 3 Mouse 4
1759 1746 1291 1386
scTCR data preparation
Download scTCR data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7918/)
data_path <- "https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-7918/E-MTAB-7918.processed.1.zip"
tcr_dir <- "./input/Xiong_TIL/TCR"
system(sprintf("mkdir -p %s", tcr_dir))
system(sprintf("curl %s --output %s/data.zip", data_path, tcr_dir))
system(sprintf("unzip -o %s/data.zip -d %s", tcr_dir, tcr_dir))Mouse 1 to 4 (sample ID 4 to 7) correspond to TCR-seq libraries 35 to 38
libIDtoSampleID_VDJ <- 4:7
names(libIDtoSampleID_VDJ) <- 35:38
vdj.list <- list()
for (i in 1:length(libIDtoSampleID_VDJ)) {
s <- names(libIDtoSampleID_VDJ)[i]
vdj.list[[i]] <- read.csv(sprintf("%s/filtered_contig_annotations_%s.csv", tcr_dir,
s), as.is = T)
# Rename barcodes to match scRNA-seq suffixes
vdj.list[[i]]$barcode <- sub("\\d$", "", vdj.list[[i]]$barcode)
vdj.list[[i]]$barcode <- paste0(vdj.list[[i]]$barcode, libIDtoSampleID_VDJ[i])
vdj.list[[i]]$raw_clonotype_id <- paste0(vdj.list[[i]]$raw_clonotype_id, "-",
libIDtoSampleID_VDJ[i])
vdj.list[[i]]$SampleLabel <- libIDtoSampleID_VDJ[i]
}Combine alpha and beta chains using the combineTCR function from scRepertoire
# Using parameters removeNA=T and removeMulti=T will remove cells with multiple
# a-b combinations
combined <- combineTCR(vdj.list, samples = libIDtoSampleID_VDJ, ID = names(libIDtoSampleID_VDJ),
cells = "T-AB", removeNA = T, removeMulti = T)
for (i in seq_along(combined)) {
combined[[i]] <- stripBarcode(combined[[i]], column = 1, connector = "_", num_connects = 3)
}The function ‘combineExpression’ of scRepertoire allows incorporating clonotype information to a Seurat object, and creates the categorical variable ‘cloneTypes’ discretizing frequencies of the clonotypes
We have now paired expression and TCR data for the query samples, and loaded them into a unified Seurat object. We can proceed to project the data onto the reference atlas.
ProjecTILs
Load the reference atlas
[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Github/ProjecTILs_CaseStudies/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Reference map ref_TILAtlas_mouse_v1"
Project query data (with clonotype information stored as metadata) onto the TIL reference atlas
[1] "Using assay RNA for query"
celltype.pred
CD4T CD8T Non-Tcell Tcell_unknown Treg
467 3115 431 322 1739
unknown
108
CD8Tstate
Naive EffectorMemory MemoryLike Exhausted unknown
180 754 570 1510 101
[1] "539 out of 6182 ( 9 % ) non-pure T cells removed. Use filter.cells=FALSE to avoid pre-filtering (NOT RECOMMENDED)"
[1] "Aligning query to reference map for batch-correction..."
Projecting corrected query onto Reference PCA space
[1] "Projecting corrected query onto Reference UMAP space"
Visualization of projected data.
T cells are projected mostly in the Treg, CD8 Terminal exhausted (CD8_Tex) and Precursor exhausted (CD8_Tpex) areas, with some clusters in the T-helper (Th1) and CD8 Effector memory areas, and to a lesser extent in Naive-like and CD8 Early-Activated areas.
p1 <- plot.projection(ref)
p2 <- plot.projection(ref, query.projected)
grid.arrange(p1, p2, ncol = 2)Visualize the projections per sample. Broadly, the distribution across the atlas is similar for the four mice, with some variation in the frequency of Effector Memory T cells.
plots <- list()
sample_names <- unique(query.projected$SampleLabel)
for (sample_i in seq_along(sample_names)) {
sample <- sample_names[sample_i]
plots[[sample_i]] <- plot.projection(ref, query.projected[, query.projected$SampleLabel ==
sample]) + ggtitle(sample)
}
grid.arrange(grobs = plots, ncol = 2)Classify projected T cells into cell subtypes/states
query.projected <- cellstate.predict(ref = ref, query = query.projected)
table(query.projected$functional.cluster) #Cell state assignment is stored in the 'functional.cluster' metadata field
CD4_NaiveLike CD8_EarlyActiv CD8_EffectorMemory CD8_NaiveLike
31 109 575 261
CD8_Tex CD8_Tpex Tfh Th1
1752 436 10 541
Treg
1928
Look at distribution of T cells in terms of cell states.
We can check the gene expression profile of the cells assigned to each state (yellow), and compare them to those of the reference states (black).
For example, cells projected into Tex and Tpex region express Tox and Pdcd1, while only Tex express Gzmb and Havcr2, as expected.
Clonality analysis
Now, let’s see where the expanded clones are located within the map:
levs <- c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)",
"Small (1 < X <= 5)", "Single (0 < X <= 1)", NA)
palette <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))
query.projected$cloneType <- factor(query.projected$cloneType, levels = levs)
DimPlot(query.projected, group.by = "cloneType") + scale_color_manual(values = c(palette(5)),
na.value = "grey")Larger (most highly expanded) clones are concentrated in the Tex/Tpex area. This is largely explained by the fact that sustained antigenic stimulation drives immunodominant clones into the Tox+-driven exhausted lineage (Tex - Tpex region)
Let’s check the raw numbers
CD4_NaiveLike CD8_EarlyActiv
Hyperexpanded (100 < X <= 500) 0 0
Large (20 < X <= 100) 0 1
Medium (5 < X <= 20) 0 2
Small (1 < X <= 5) 0 5
Single (0 < X <= 1) 23 42
CD8_EffectorMemory CD8_NaiveLike CD8_Tex
Hyperexpanded (100 < X <= 500) 1 0 86
Large (20 < X <= 100) 99 0 405
Medium (5 < X <= 20) 47 1 313
Small (1 < X <= 5) 91 5 216
Single (0 < X <= 1) 129 123 138
CD8_Tpex Tfh Th1 Treg
Hyperexpanded (100 < X <= 500) 31 0 0 3
Large (20 < X <= 100) 98 0 0 8
Medium (5 < X <= 20) 68 1 27 121
Small (1 < X <= 5) 54 2 118 225
Single (0 < X <= 1) 32 3 170 287
Plot clonal size by functional cluster
meta <- melt(table(query.projected@meta.data[!is.na(query.projected@meta.data$Frequency),
c("functional.cluster", "cloneType")]), varnames = c("functional.cluster", "cloneType"))
ggplot(data = meta, aes(x = functional.cluster, y = value, fill = cloneType)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + scale_fill_manual(values = c(palette(5)),
na.value = "grey")This confirms the intial observation that Tex and Tpex present the largest clonal expansion
We can highlight specific clonotypes on the reference atlas (here those with at least 40 cells):
clone_call = "CTaa" #select column/variable to use as clonotypes ID, in this case CTaa, the paired TCR CDR3 aminoacid sequences
cutoff <- 35 #Min cells for clonotype
clonotypeSizes <- sort(table(query.projected[[clone_call]])[table(query.projected[[clone_call]]) >
cutoff], decreasing = T)
bigClonotypes <- names(clonotypeSizes)
plots <- list()
for (i in 1:length(bigClonotypes)) {
ctype <- bigClonotypes[i]
plots[[i]] <- plot.projection(ref, query.projected[, which(query.projected[[clone_call]] ==
ctype)]) + ggtitle(sprintf("%s - size %i", ctype, clonotypeSizes[ctype]))
}
grid.arrange(grobs = plots, ncol = 2)The majority of clones tend to span Tex and Tpex states. Indeed, Tcf7+ precursor exhausted Tpex cells self-renew and give rise to more differentiated Tex effector cells.
The clonal overlap (Morisita similarity index) implemented in scRepertoire confirms this pattern:
meta.list <- expression2List(query.projected, group = "functional.cluster")
clonalOverlap(meta.list, cloneCall = "gene", method = "morisita") + theme(axis.text.x = element_text(angle = 90,
hjust = 1, vjust = 0.5))We can also visualize overlap of the top CD8 clones, and their overlap across functional clusters, using scRepertoire’s alluvial plot
compareClonotypes(meta.list, numbers = 6, samples = c("CD8_EffectorMemory", "CD8_Tpex",
"CD8_Tex", "CD8_NaiveLike", "CD8_EarlyActiv"), cloneCall = "aa", graph = "alluvial") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))Conclusions
Projection of the MC38 TILs onto the reference atlas showed that these T cell samples mostly consist of exhausted (CD8_Tex), precursor-exhausted CD8 T cells (CD8_Tpex), and Tregs, typical of highly immunogenic, “hot” tumors. Also, the majority of expanded clones were found spanning the CD8_Tex and CD8_Tpex states. This is expected as sustained antigen stimulation in the tumor drives immuno-dominant tumor-reactive clones towards the exhaustion differentiation path.
The combination of ProjecTILs and scRepertoire simplifies the joint analysis of single-cell expression data and clonotype analysis, in the context of an annotated reference atlas of TIL states.
Further reading
Original publication - Xiong et al. (2019) Cancer Immunol Res
ProjecTILs case studies - INDEX - Repository